A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing
Abstract
We have developed ConPath, a Windows-based scaffold analysis program. ConPath constructs scaffolds by ordering and orienting separate sequence contigs using the mate-pair information that links contig pairs. Our algorithm builds directed graphs from the link information and traverses them to find the longest acyclic graphs. Using the end-read pairs of fixed-size mate-pair libraries, ConPath determines the relative orientations of all contigs, estimates the gap size between each pair of adjacent contigs, and reports misassembly information by validating orientations and gap sizes. We have used ConPath in more than 10 microbial genome projects, including those for Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assemblies and identified several erroneous contigs using the four error types defined in ConPath. ConPath also provides convenient features and viewers for examining each contig in detail, including a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and printing of complex scaffold structures.
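The abstract describes the scaffolding pipeline only at a high level, so the Python sketch below is a hedged illustration of the general technique and not ConPath's implementation. It assumes a single fixed-size library with a known insert size I whose reads face each other, and mate-pair records of the hypothetical form (contig, 0-based 5' end position, is_forward) for each read. Under those assumptions, each contig-spanning pair fixes the two contigs' relative orientation and gives a gap estimate g = I - d1 - d2, where d is the distance from a read's 5' end to the contig end it points toward. The functions `dist_to_end`, `link`, `bundle`, and `longest_scaffold` and the `tolerance` parameter are illustrative names, not ConPath's. Validation here only checks orientation and gap-size agreement; ConPath's four error types are not spelled out in the abstract. Ordering uses the heaviest path in a directed acyclic graph (weighted by contig length) as a simplified stand-in for ConPath's traversal.

```python
from collections import defaultdict, deque
from statistics import median

FLIP = {"+": "-", "-": "+"}  # reverse an orientation


def dist_to_end(length, pos, forward):
    """Bases from a read's 5' end to the contig end the read points toward.

    pos is the 0-based 5' end: the leftmost base of a forward read,
    the rightmost base of a reverse read.
    """
    return length - pos if forward else pos + 1


def link(pair, lengths, insert_size):
    """Turn one contig-spanning mate pair into an oriented link and a gap estimate."""
    c1, p1, f1, c2, p2, f2 = pair
    o1 = "+" if f1 else "-"  # assumed facing reads: read 1 points at the gap
    o2 = "-" if f2 else "+"  # and read 2 points back toward contig 1
    gap = (insert_size - dist_to_end(lengths[c1], p1, f1)
           - dist_to_end(lengths[c2], p2, f2))
    if c1 > c2:  # store each link once, read from the alphabetically first contig
        (c1, o1), (c2, o2) = (c2, FLIP[o2]), (c1, FLIP[o1])
    return (c1, o1, c2, o2), gap


def bundle(pairs, lengths, insert_size, tolerance):
    """Group links by contig pair; flag orientation or gap-size disagreements."""
    groups = defaultdict(list)
    for pair in pairs:
        if pair[0] == pair[3]:
            continue  # both reads on one contig: no scaffolding information
        (a, oa, b, ob), gap = link(pair, lengths, insert_size)
        groups[(a, b)].append((oa, ob, gap))
    edges, errors = {}, []
    for (a, b), links in groups.items():
        orients = {(oa, ob) for oa, ob, _ in links}
        gaps = [gap for _, _, gap in links]
        if len(orients) > 1:
            errors.append((a, b, "orientation conflict"))
        elif max(gaps) - min(gaps) > tolerance:
            errors.append((a, b, "gap size conflict"))
        else:
            (oa, ob), = orients
            edges[(a, oa, b, ob)] = median(gaps)
    return edges, errors


def longest_scaffold(edges, lengths):
    """Heaviest chain of oriented contigs (weight = contig length) in the link DAG."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for a, oa, b, ob in edges:
        # Add the link and its reverse-complement twin.
        for u, v in (((a, oa), (b, ob)), ((b, FLIP[ob]), (a, FLIP[oa]))):
            succ[u].append(v)
            indeg[v] += 1
            nodes.update((u, v))
    order, queue = [], deque(n for n in nodes if indeg[n] == 0)
    while queue:  # Kahn's topological sort
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("link graph has a cycle; resolve conflicting links first")
    best = {n: (lengths[n[0]], None) for n in nodes}  # (path weight, predecessor)
    for u in order:
        for v in succ[u]:
            if best[u][0] + lengths[v[0]] > best[v][0]:
                best[v] = (best[u][0] + lengths[v[0]], u)
    # A scaffold and its reverse complement tie; sorted() makes the pick deterministic.
    node = max(sorted(nodes), key=lambda n: best[n][0])
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return path[::-1]


if __name__ == "__main__":
    # Toy data invented for illustration: A+, then B- (gap about 100 bp), then C+ (gap 300 bp).
    lengths = {"A": 1000, "B": 800, "C": 600}
    pairs = [("A", 200, True, "B", 200, True), ("A", 300, True, "B", 104, True),
             ("C", 499, False, "B", 699, False),  # the B-C link, seen from C
             ("A", 900, True, "C", 100, False), ("A", 900, True, "C", 100, True)]
    edges, errors = bundle(pairs, lengths, insert_size=1500, tolerance=30)
    print(edges)
    print(errors)
    print(longest_scaffold(edges, lengths))
```

On this toy input, the two A-C pairs disagree on orientation, so they are reported as an error instead of becoming an edge. The A-B gap is the median of the estimates 100 and 104 (102 bp), and the printed chain is C-, B+, A-, the reverse complement of A+, B-, C+, which describes the same scaffold. Because the longest-path problem is NP-hard on general graphs, the sketch refuses cyclic link graphs rather than attempting to break cycles.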
Journal: Journal of Biomedicine and Biotechnology
Volume: 2008
Publication date: 2008